FOREWORD


The methods of analysis of extreme-value data developed
in the present report will be useful in the economical
handling of large amounts of extreme gust-load data for
airplanes in flight. These methods will in many cases give
better results than procedures used previously which required
a much larger amount of calculation.

The development of these newer methods is one phase of
a project aimed at the improved application of the theory of
extreme values to the analysis of gust loads of airplanes.

This research, carried out by Mr. Julius Lieblein under the
general supervision of Dr. Churchill Eisenhart, Chief of the
Statistical Engineering Laboratory, was supported by the
National Advisory Committee for Aeronautics. The Statistical
Engineering Laboratory is Section 11.3 of the National Applied
Mathematics Laboratories (Division 11, National Bureau of
Standards), and is concerned with the development and
application of modern statistical methods in the physical sciences
and engineering.

Chief, National Applied


ON CERTAIN ESTIMATORS BASED ON LARGE SAMPLES OF EXTREMES


Julius  Lieblein* 


I. SUMMARY


Statistical techniques are developed and applied for
the economical handling of large masses of extreme gust-load
data. These methods are especially well adapted for
punch-card equipment, requiring essentially just a counting
sorter. The efficiencies of the proposed methods when used
with large samples are in some cases superior to the
efficiencies of certain methods proposed by E. J. Gumbel which
have been used in an NACA publication (reference 1).

This report is limited to methods applicable to large
samples. The principal technique is the use of simple sums
and differences of order statistics. If the n values in a
sample are arranged in (say) ascending order, they are
called order statistics. If the data may be assumed to come
from a distribution of extreme values, these order statistics
can be used in estimating the parameters of the fitted
distribution. This report describes and gives the
efficiencies of rapid methods:

(i) for estimating the population mode when the
dispersion is known by using 1, 2, 3, 4, or 5
suitably chosen order statistics from the sample
of n;

(ii) for estimating the population standard deviation
by using 2 or 4 suitable order statistics.

These methods are summarized in Table I; their use is
explained in the text of this report, and listed in condensed
form under Conclusions.


* Statistical Engineering Laboratory, National Bureau of
Standards. Preparation of this paper was sponsored by
the National Advisory Committee for Aeronautics.


II. INTRODUCTION

The purpose of this report is to contribute some results
on the use of order statistics in the analysis of large
bodies of extreme gust-load data. Reference 1 indicates
that the distribution of extreme values of the form

$F(x) = \exp\left[-e^{-\alpha(x-u)}\right]$

= Probability that an observed value is $< x$

is applicable to problems of predicting the frequency of
encountering very severe gust loads and gust velocities under
certain test and operating conditions. Applicability of the
extreme-value distribution will therefore be assumed for
purposes of this report.

The problem is then to fit an extreme-value distribution
to a sample of data in order to provide a basis for analysis
and prediction. A method of fitting by estimating the two
parameters, $u$ (the mode) and $1/\alpha$ (the scale factor or
dispersion), has been proposed by E. J. Gumbel (reference 2,
lecture 3, p. 18). As a result of preliminary analysis it appears
that this method may have low efficiency relative to the best
efficiency possible, efficiency being measured relative to
the number of observations necessary to assure a specified
degree of precision.

It is therefore desirable to investigate alternative
methods of estimation for possible improvement in efficiency.

To avoid ambiguity, it is essential to distinguish
clearly between an estimator and an estimate. An estimator
of a parameter is a mathematical function of sample values
used to estimate the parameter. An estimate is the
numerical value of this function obtained by substituting into it
the values in an actual sample of observations.

One type of approach has been given by F. Mosteller
(reference 3) for large samples from a normal population.
If the n sample values are arranged in (say) ascending order,

$x_1 \le x_2 \le \cdots \le x_n$ ,

they are called order statistics with ranks 1, 2, ..., n,
respectively. Mosteller has found a method of choosing in
advance k of the n ranks, k = 1, 2, ..., 10, such that the
correspondingly spaced order statistics serve to build an
estimator for the mean of a normal distribution which is up
to 95 percent efficient, and an estimator for the standard
deviation which is up to 75 percent efficient.

The methods used by Mosteller have been adapted and
extended in this report to give results for extreme-value
samples analogous to results he obtained for samples from a
normal distribution.

The big advantage in the use of order statistics for
large samples is that three or fewer of the n order
statistics can give as much precision as the use of all n values
treated by ordinary methods. For example, for n = 1000, the
200th value in order of size estimates the mode of the
population with greater precision than the average of all
1000 values. For large samples the appropriate sample
values can readily be selected from the data by mechanical
sorting, provided the data are available in complete detail,
before grouping or other processing.


III. SYMBOLS


$x$                      random variable

$F(x)$                   cumulative probability function

$f(x)$                   density function, derivative of $F(x)$

$\alpha$                 scale parameter of distribution of largest values

$\beta$                  same as $1/\alpha$

$u$                      mode of distribution of largest values

$n$, $N$                 number of observations in a sample

$x_1, x_2, \dots, x_n$   values in a sample of size n arranged in
                         increasing order

$k$                      number of order statistics used in estimating a
                         parameter

$\hat{u}$                any estimator of parameter u; also used for estimator
                         $\bar{x} - C\beta$

$\hat{\beta}$            any estimator of parameter $\beta$

$\hat{u}_k$              estimator of parameter u formed from k order statistics

$\hat{\beta}_k$          estimator of parameter $\beta$ formed from k order statistics

$\hat{u}^*$              order statistic estimator of u when $\beta$ is unknown

$\hat{u}_G$              Gumbel's estimator of u when $\beta$ is unknown

$v_k$                    unbiased estimator of u formed from k order statistics

$E(T)$                   expected value of any estimator T

$\sigma^2(T)$ or $V(T)$  variance of T

$\lambda_i = n_i/n$      ratio of rank of order statistic to sample
                         size n

$b_N$                    bias in equidistant method of spacing order statistics
                         in sample of N largest values

$B_N$                    bias in Gumbel's plotting method for sample of N
                         largest values

$x_{i,k}$                i-th order statistic in sample of k values from
                         extreme-value distribution with
                         parameters $\alpha$, u

$y_{i,k}$                i-th order statistic in sample of k values from
                         extreme-value distribution with
                         parameters $\alpha = 1$, $u = 0$

$C$                      Euler's number, 0.57721566...

$c$                      correction factor for unbiased estimate of
                         dispersion parameter

$s$                      standard deviation of observed sample values
IV. EXTREME-VALUE PARAMETERS

The extreme-value distribution to be fitted to a sample
of data has the form

(1)  $F(x) = \exp\left[-e^{-\alpha(x-u)}\right]$

     = Probability that an observed value is $< x$.

This is the cumulative form of the distribution. The
frequency, or density, form is given by the derivative

$f(x) = F'(x) = \alpha \exp\left[-\alpha(x-u) - e^{-\alpha(x-u)}\right]$ .

This function has the general shape shown in Figure 1, which
is scaled so that $\alpha = 1$ and $u = 0$. The function F(x)
represents the area under the curve f(x) to the left of the
ordinate drawn at the point x.
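The relation between the cumulative form (1) and the density can be checked numerically. The following is a minimal sketch, not part of the original report; the function names and the trapezoid-rule settings (`lower`, `steps`) are illustrative assumptions.

```python
import math

def gumbel_cdf(x, u=0.0, alpha=1.0):
    """Equation (1): F(x) = exp[-exp(-alpha * (x - u))]."""
    return math.exp(-math.exp(-alpha * (x - u)))

def gumbel_pdf(x, u=0.0, alpha=1.0):
    """Density f(x) = alpha * exp[-alpha (x - u) - exp(-alpha (x - u))]."""
    z = alpha * (x - u)
    return alpha * math.exp(-z - math.exp(-z))

def area_left_of(x, u=0.0, alpha=1.0, lower=-10.0, steps=20000):
    """Trapezoid-rule area under f from `lower` (far in the left tail) to x."""
    h = (x - lower) / steps
    total = 0.5 * (gumbel_pdf(lower, u, alpha) + gumbel_pdf(x, u, alpha))
    for i in range(1, steps):
        total += gumbel_pdf(lower + i * h, u, alpha)
    return total * h

# With alpha = 1 and u = 0 (the scaling of Figure 1) the two numbers agree closely.
print(gumbel_cdf(1.0), area_left_of(1.0))
```

With the default scaling, the density is largest at x = u = 0, which is the mode referred to in the text.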

Fitting the extreme-value distribution consists in
estimating the parameters $\alpha$ and u from the data in hand, or
of estimating one of these parameters when the other is
known from previous data. These parameters have meanings
analogous to the standard deviation $\sigma$ and mean (or mode) $\mu$
in the case of the normal distribution

$g(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$ .

The quantity $1/\alpha$, like $\sigma$, is a scale parameter measuring
dispersion about the central value; u is a location
parameter analogous to $\mu$ of the normal distribution, and
designates the most probable value or mode. The mean of the
extreme-value distribution is

(2)  $E(x) = u + \frac{C}{\alpha}$ ,

where C is Euler's number, 0.57721566... Since it is more
natural to deal with $1/\alpha$ rather than $\alpha$ we put

$\beta = 1/\alpha$

and consider $\beta$ as the parameter to be estimated. $\beta$ and u
will also be referred to as extreme-value, or extremal,
parameters.


V. ESTIMATION OF MODE $u$ WHEN $\beta$ IS KNOWN


Estimation of the mode in the case where the other
parameter, $\beta$, is known, consists in choosing a specified k
of the sample values $x_1, x_2, \dots, x_n$ and computing their
mean:

$\hat{u}_k = \frac{1}{k} \sum_{i=1}^{k} x_{n_i}$ ,

where $\hat{u}_k$ is the estimator of u formed with the k order
statistics $x_{n_1}, x_{n_2}, \dots, x_{n_k}$. The problem is how to select
the k values $n_1, n_2, \dots, n_k$.

Three methods of selecting, or spacing, are considered
in this report: (i) "optimum" spacing, which gives a more
efficient estimator than any other method of spacing; (ii)
equidistant spacing, used in E. J. Gumbel's graphical method
of analysis (reference 1, Lecture 2, pp. 20-21); and (iii) a
method of equating expected values analogous to one advocated
by B. F. Kimball (reference 5). These estimators are
compared below with each other and with an estimator given by
Gumbel which uses the mean of all sample values.

1.  "Optimum"  spacing. 

a. Use of one sample value. From the expression for
F(x) in equation (1) we see that the probability that a
value is less than the mode, x = u, is

$F(u) = e^{-1} = 0.36788$ .

Hence in a sample of, say, 1,000, we would expect the
368th value in order of size to give a good estimate of
u. In general, for n large, our first estimator of the
mode u is

$\hat{u}_1 = x_{n_1} = x_{.368n}$ ,

where

$\hat{u}_1$ denotes an estimator of u using one sample value;
$x_{.368n}$ denotes the ordered sample value whose rank
is .368 times sample size n.

The quantity $\hat{u}_1$ is a statistic whose value varies
from sample to sample. Its distribution is approximately
normal for n large (the larger the n, the closer the
approximation) with (asymptotically) [1]

(3)  Mean $\hat{u}_1 = E(\hat{u}_1) = u$

(4)  Variance of $\hat{u}_1 = \sigma^2(\hat{u}_1) = \frac{\lambda_1(1-\lambda_1)}{n[f(u)]^2}$ ,

where

$\lambda_1 = F(u) = \frac{1}{e}$ ,

$f(u) = \left[\frac{dF(x)}{dx}\right]_{x=u} = \frac{\alpha}{e} = \frac{1}{\beta e}$ .

[1] Unless otherwise indicated, all formulas in this report
involving characteristics of estimators such as mean,
variance, bias, efficiency, will be understood to hold in
an asymptotic sense only, i.e. in the limit as sample
size n becomes indefinitely large. But for all finite n,
however large, the relationships are to be regarded as
merely approximate, the approximation being better (in
general) the larger the sample size.

Since the average value of $\hat{u}_1$ in repeated sampling is u,
the parameter which is estimated, we say that $\hat{u}_1$ is
(asymptotically) an unbiased [3] estimator. The variance
$\sigma^2(\hat{u}_1)$ (also denoted by $V(\hat{u}_1)$) is

$\sigma^2(\hat{u}_1) = \frac{(e-1)\beta^2}{n}$ .

In this relation the variance on the left measures the
degree of precision obtainable from a sample of n
observations, and the equation shows that precision is
inversely proportional to the sample size.
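The asymptotic mean and variance of the single-value estimator can be illustrated by sampling experiment. The sketch below is an illustration under stated assumptions rather than a procedure from the report: it draws artificial samples by inverting equation (1) with u = 0 and $\beta$ = 1, and the seed, sample size, and number of repetitions are arbitrary choices.

```python
import math
import random

def sample_extreme(n, rng, u=0.0, beta=1.0):
    """Draw n values from equation (1) by inversion: x = u - beta * ln(-ln p)."""
    out = []
    for _ in range(n):
        p = max(rng.random(), 1e-300)  # guard against p == 0
        out.append(u - beta * math.log(-math.log(p)))
    return out

def u1_estimate(sample):
    """Order statistic whose 1-based rank is nearest to n / e (about .368 n)."""
    n = len(sample)
    rank = max(1, round(math.exp(-1) * n))
    return sorted(sample)[rank - 1]

def simulate(n=1000, reps=1000, seed=0):
    """Mean and variance of the estimator over repeated artificial samples."""
    rng = random.Random(seed)
    est = [u1_estimate(sample_extreme(n, rng)) for _ in range(reps)]
    mean = sum(est) / reps
    var = sum((e - mean) ** 2 for e in est) / (reps - 1)
    return mean, var

mean_hat, var_hat = simulate()
theory = (math.e - 1.0) / 1000  # (e - 1) * beta^2 / n with beta = 1, n = 1000
print(mean_hat, var_hat, theory)
```

The simulated mean should lie near the true mode 0 and the simulated variance near the asymptotic value; exact agreement is not to be expected for any finite number of repetitions.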

An inequality (due to Cramér and Rao, reference 7)
in the theory of statistical estimation shows that under
certain general conditions the smallest variance (and
hence greatest precision) obtainable with any unbiased estimator
of the parameter u, for a sample of size m, is not less
than

$\sigma^2(\hat{u})_{\min} = \frac{\beta^2}{m}$ .

From the above it is seen that when $\sigma^2(\hat{u}_1) = \sigma^2(\hat{u})_{\min}$ ,

$\frac{m}{n} = \frac{1}{e-1} = 0.5820$ .

This means that for a given degree of precision, the
theoretically smallest number of observations necessary
(m = .5820 n) is not much over half the number (n)
required by the given estimator $\hat{u}_1$. This condition is
true generally: the actual number of observations is
always greater than (or in some cases equal to) the
theoretically smallest number. The ratio m/n thus has
values only from 0 to 1, and may therefore be used as
a measure of "efficiency." It is more conveniently
calculated from the ratio $\sigma^2(\hat{u})_{\min} / \sigma^2(\hat{u}_1)$ formed for
m = n, and we write

Efficiency of $\hat{u}_1 = \frac{\sigma^2(\hat{u})_{\min}}{\sigma^2(\hat{u}_1)} = \frac{1}{e-1} = .5820$ ,

and say that the estimator $\hat{u}_1$ is about 58 percent as
efficient as a theoretically most efficient estimator.

[3] The term unbiased in statistics is generally reserved for
the case of complete absence of bias in samples of any
size, small or large. For simplicity, since the present
report deals only with large n, the term is applied also
to estimators which are not strictly unbiased for finite
n, as long as their bias disappears when n increases
without limit.
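The efficiency calculation just described is simple arithmetic. The sketch below assumes the lower bound $\beta^2/m$ and the variance $(e-1)\beta^2/n$, which are the forms consistent with the ratio .5820 quoted in the text; the function names are hypothetical.

```python
import math

def min_variance(m, beta=1.0):
    """Lower bound beta^2 / m for an unbiased estimator of u from m observations."""
    return beta ** 2 / m

def var_u1(n, beta=1.0):
    """Asymptotic variance (e - 1) * beta^2 / n of the .368n order statistic."""
    return (math.e - 1.0) * beta ** 2 / n

def efficiency_u1(n=1000, beta=1.0):
    """Ratio of the bound to the actual variance, both taken at sample size n."""
    return min_variance(n, beta) / var_u1(n, beta)

def equivalent_sample_size(n):
    """Theoretically smallest m giving the precision that u1 attains with n values."""
    return efficiency_u1(n) * n

print(efficiency_u1(), equivalent_sample_size(1000))
```

The ratio depends neither on n nor on $\beta$, which is why a single figure of about 58 percent describes the estimator.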
Can a more efficient estimator be obtained? If in
place of $\hat{u}_1 = x_{n_1}$ above, we use any other ordered
sample value

(5)  $\hat{u}_1' = x_{n_1'}$ ,

then Mosteller (reference 3) has shown that

(6)  $E(\hat{u}_1') = u'$

(7)  $\sigma^2(\hat{u}_1') = \frac{\lambda_1(1-\lambda_1)}{n[f(u')]^2}$ ,

where the rank

$n_1' = \lambda_1 n$ ,  $0 < \lambda_1 < 1$ ,

and u' is the position of the ordinate which cuts off
the area $\lambda_1$ (on the left) under the curve f(x).

By varying $\lambda_1$ it is found by trial that

$\lambda_1 = .20$

gives the smallest variance and therefore the greatest
efficiency, 64.8 percent. Hence, theoretically,

$\hat{u}_1' = x_{.20n}$

gives a "best" or "optimum" estimator when a single
sample value is used.
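The search "by trial" can be imitated with a coarse grid. The sketch below is illustrative only: it assumes the lower bound $\beta^2/n$ and uses the fact that, for distribution (1), the density at the ordinate cutting off area $\lambda$ is $\lambda(-\ln\lambda)/\beta$, so that the efficiency of a single order statistic reduces to $\lambda(\ln\lambda)^2/(1-\lambda)$. The grid step of .01 is an arbitrary choice.

```python
import math

def single_rank_efficiency(lam):
    """(beta^2 / n) divided by equation (7), using f(u') = lam * (-ln lam) / beta."""
    return lam * math.log(lam) ** 2 / (1.0 - lam)

def best_single_rank():
    """Grid search over lambda = .01, .02, ..., .99 for the largest efficiency."""
    grid = [i / 100 for i in range(1, 100)]
    return max(grid, key=single_rank_efficiency)

best = best_single_rank()
print(best, single_rank_efficiency(best))
```

On this grid the maximum falls at $\lambda_1$ = .20 with efficiency close to .648, while $\lambda_1 = 1/e$ returns the value .5820 found for the .368n order statistic.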


However, if $\hat{u}_1'$ is used to estimate u, a correction
must be added, since by equation (6), $\hat{u}_1'$ estimates not u
but u' and is therefore biased. The value of the bias
is defined as the amount by which an estimator of a
parameter over- or underestimates it (on the average), the
bias being positive or negative accordingly. Using the
fact that the area under f(x) up to the ordinate at x
is F(x), we have from the definition of u', equation (1),
and the definition of $\lambda_1$,

(8)  $F(u') = \exp\left[-e^{-\alpha(u'-u)}\right] = \lambda_1$ ,

whence

(9)  $u' = u + \frac{1}{\alpha}\left[-\ln(-\ln \lambda_1)\right] = u - (.47588)\beta$ ,

where ln denotes logarithm to the base e. The quantity
in brackets may be obtained, for example, by looking
up the value $\lambda_1 = .20$ in
Table 2 of the National Bureau of Standards' forthcoming
probability tables for extreme values (reference 4).

Thus $\hat{u}_1'$, which estimates u', not u, requires a
knowledge of the other parameter $\beta$ in order to estimate
u without bias. This disadvantage must be set against the
greater efficiency of $\hat{u}_1'$ over $\hat{u}_1$. The less efficient
estimator $\hat{u}_1$ can be used without knowledge of $\beta$. If $\beta$
is known, then the estimator to use is

(10)  $v_1 = x_{.20n} + .4759\,\beta$ .

This is unbiased and also most efficient among the n
order statistics. This result is listed in Table I,
columns (1) - (6), line 1.
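A sketch of how estimator (10) might be computed from raw data follows. It is an illustration, not the report's procedure: the helper names are hypothetical, fractional ranks are handled by linear interpolation (an assumption), and the correction is computed from equation (9) rather than entered as the rounded constant .4759.

```python
import math

def bias_coefficient(lam):
    """(u' - u) / beta = -ln(-ln lam), from equation (9)."""
    return -math.log(-math.log(lam))

def order_statistic(sample, rank):
    """Value at a 1-based, possibly fractional rank, by linear interpolation."""
    xs = sorted(sample)
    rank = min(max(rank, 1.0), float(len(xs)))
    lo = int(math.floor(rank))
    frac = rank - lo
    if frac == 0.0:  # whole-number rank, including rank == len(xs)
        return xs[lo - 1]
    return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])

def v1(sample, beta):
    """Equation (10): x_{.20n} plus .4759 * beta (beta assumed known)."""
    return order_statistic(sample, 0.20 * len(sample)) - bias_coefficient(0.20) * beta

# Idealized data placed exactly on a distribution with u = 5, beta = 2.
ideal = [5.0 + 2.0 * bias_coefficient(i / 1000) for i in range(1, 1000)]
print(bias_coefficient(0.20), v1(ideal, 2.0))
```

For the idealized data the estimate falls very close to the assumed mode of 5; with real observations it would vary from sample to sample as described above.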

b. Use of two or three sample values. If instead
of one order statistic we are willing to use k suitably
chosen ones out of the n, we can estimate the mode by using

(11)  $\hat{u}_k = \frac{1}{k} \sum_{i=1}^{k} x_{n_i}$ .

The generalizations for equations (6), (7), and (9) are,
respectively,

(12)  $E(\hat{u}_k) = \frac{1}{k} \sum_{i=1}^{k} u_i$ ,

(13)  $\sigma^2(\hat{u}_k) = \frac{1}{n k^2} \left[ \sum_{i=1}^{k} \frac{\lambda_i(1-\lambda_i)}{[f(u_i)]^2} + 2 \sum_{i<j}^{k} \frac{\lambda_i(1-\lambda_j)}{f(u_i)\, f(u_j)} \right]$ ,

(14)  $E(\hat{u}_k) = u + \left\{ \frac{1}{k} \sum_{i=1}^{k} \left[-\ln(-\ln \lambda_i)\right] \right\} \beta$ ,

where

(15)  $\lambda_i = F(u_i)$ ,  $i = 1, 2, \dots, k$ .

For k = 2 it is found by trial that

$\lambda_1 = .08$ ,  $\lambda_2 = .40$

give optimum spacing. After taking account of the bias,
given by the term in $\beta$ in (14), the unbiased estimator
to use is

$v_2 = \frac{1}{2}\left(x_{.08n} + x_{.40n}\right) + .4074\,\beta$ ,

with efficiency close to 82 percent as shown in Table I,
columns (1) - (6), line 2.

Likewise for k = 3, the optimum estimator is

$v_3 = \frac{1}{3}\left(x_{.05n} + x_{.20n} + x_{.45n}\right) + .4494\,\beta$ .

For this statistic the efficiency rises to over 88
percent, as shown in Table I, columns (1) - (6), line 3.

Computation rapidly becomes more complicated for
higher values of k. If we were to continue till k = n
then all sample values would be drawn into the
estimator $\hat{u}_k = \hat{u}_n$. I.e., the estimator in question would
then be merely the sample mean corrected for bias,

$\hat{u}_n = \hat{u} = \bar{x} - C\beta$ ,

with efficiency .6079, as shown in the last line of
column (6) of Table I. Hence if we were to compute optimum
estimators for increasing values of k, it is noteworthy
that efficiency would not always keep increasing. In
fact, we must ultimately reach the efficiency .6079 when
all n sample values have been taken, which implies that
for some value of k, the efficiency would reach its
maximum value and from then on, the efficiency would actually
become worse, as more sample values were drawn into the
estimator.
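The efficiencies quoted for one, two, and three order statistics can be reproduced from the variance of a mean of order statistics. The sketch below is a hedged illustration: it assumes the usual large-sample variances $\lambda_i(1-\lambda_i)/(n f_i^2)$ and covariances $\lambda_i(1-\lambda_j)/(n f_i f_j)$ for $i < j$, the density value $f(u_i) = \lambda_i(-\ln\lambda_i)/\beta$ for distribution (1), and the lower bound $\beta^2/n$. The constant $6/\pi^2$ for the sample mean rests on the known variance $\pi^2\beta^2/6$ of the distribution, which the report does not state explicitly.

```python
import math

def density_at(lam, beta=1.0):
    """f(u_i) for distribution (1) at the ordinate cutting off area lam."""
    return lam * (-math.log(lam)) / beta

def variance_of_mean(lams, n, beta=1.0):
    """Large-sample variance of the mean of the order statistics of ranks lam_i * n."""
    lams = sorted(lams)
    k = len(lams)
    f = [density_at(lam, beta) for lam in lams]
    total = sum(lam * (1.0 - lam) / fi ** 2 for lam, fi in zip(lams, f))
    for i in range(k):
        for j in range(i + 1, k):
            total += 2.0 * lams[i] * (1.0 - lams[j]) / (f[i] * f[j])
    return total / (n * k ** 2)

def efficiency(lams, n=1000, beta=1.0):
    """Lower bound beta^2 / n divided by the variance of the estimator."""
    return (beta ** 2 / n) / variance_of_mean(lams, n, beta)

SAMPLE_MEAN_EFFICIENCY = 6.0 / math.pi ** 2  # all n values, mean corrected for bias

for spacing in ([0.20], [0.08, 0.40], [0.05, 0.20, 0.45]):
    print(spacing, round(efficiency(spacing), 4))
print("sample mean", round(SAMPLE_MEAN_EFFICIENCY, 4))
```

Under these assumptions the three spacings give efficiencies of roughly 65, 82, and 89 percent, all above the figure of about 61 percent for the mean of the whole sample.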

Example 1. It is interesting to apply these modal
estimators to the data given in Example 1 of reference
1, which uses a different method. The data, consisting
of 485 maximum values of gust velocity, one maximum for
each of 485 traverses of thunderstorms, are tabulated
in Table II taken from that source. For rough
comparison purposes, the value $\beta = 1/\alpha = 4.8263$, found in the
example of reference 1, is assumed here. The results of
estimation by means of the optimum spacing procedures
of the present report are:

n = 485

k = 1:  $v_1 = x_{97} + (.4759)(4.8263) = 10.2258 + 2.2968 = 12.5226$
        (assuming that linear interpolation to find the 97th value is
        valid)

k = 2:  $v_2 = \frac{1}{2}(x_{38.8} + x_{194}) + (.4074)(4.8263)
        = \frac{1}{2}(7.7630 + 13.4483) + 1.9662 = 12.5719$

k = 3:  $v_3 = \frac{1}{3}(x_{24.25} + x_{97} + x_{218.25}) + (.4494)(4.8263)
        = \frac{1}{3}(6.4405 + 10.2258 + 14.3000) + 2.1689 = 12.4910$ .
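The arithmetic of Example 1 can be retraced directly. The sketch below only re-uses the interpolated order statistics and correction coefficients quoted in the example; the dictionary `X` and the function name are illustrative conveniences, and no data beyond those quoted values are assumed.

```python
BETA = 4.8263  # value of 1/alpha assumed in Example 1

# Interpolated order statistics quoted in Example 1 (n = 485), keyed by rank.
X = {24.25: 6.4405, 38.8: 7.7630, 97: 10.2258, 194: 13.4483, 218.25: 14.3000}

def unbiased_estimate(ranks, correction, beta=BETA):
    """Mean of the selected order statistics plus the bias correction times beta."""
    mean = sum(X[r] for r in ranks) / len(ranks)
    return mean + correction * beta

v1 = unbiased_estimate([97], 0.4759)
v2 = unbiased_estimate([38.8, 194], 0.4074)
v3 = unbiased_estimate([24.25, 97, 218.25], 0.4494)
print(round(v1, 4), round(v2, 4), round(v3, 4))
```

The ranks are .20n for k = 1, (.08n, .40n) for k = 2, and (.05n, .20n, .45n) for k = 3, with n = 485.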

These values are listed in Table III and compare very
well, considering the saving in computing labor, with
the value $\hat{u} = 12.8370$, found in reference 1 by means
of the formula $\hat{u} = \bar{x} - \frac{C}{\alpha}$ (derived by E. J. Gumbel in
Lecture 3, page 18 of reference 2), which necessitates
calculating the mean of all 485 (grouped) values. The
"efficiency" (as here defined) of this latter estimator
of u (assuming $1/\alpha = \beta$ is known) given by the Gumbel
type of approach can be shown to be 60.8 percent, less
even than the 64.8 percent (Table I, column (6)) that
can be obtained by using only one ("optimum") sample
value.

Table III shows that the "optimum" estimates
obtained from the estimators $v_1$, $v_2$, $v_3$ (column (2)),
although reasonably close to that given by Gumbel's
estimator $\hat{u}$ (column (5)), are all less than $\hat{u}$. Ideally,
the various estimates should scatter among each other
in random fashion. The fact that they do not may be
attributed to grouping (see remark below) or to the
fact that the data depart from an underlying extreme-value
distribution. It is even possible, though very
unlikely, that the sample size n = 485 is not
sufficiently large for the normality assumptions made above to
be sufficiently valid.

Remark. The closeness of one estimate to another
computed from the same data by a different estimator
should, of course, be regarded merely as suggestive,
rather than as a definite indication of a statistical
property of the estimators. The characteristics of
estimators can be studied dependably only by theoretical methods
or by extensive experimental sampling. For large sample
sizes such as in the above example some weight can be given
to the numerical relations obtained, but it should always be
kept in mind that these might be upset through sampling
variation if another sample of observations were taken.

It should also be noted that since the data are only
given in grouped form, interpolation is necessary in order
to obtain the desired order statistics. It is likely that
more satisfactory estimates would be obtained from data
given in full detail. Data should always be available in
their original observed form, before grouping or other
processing is performed.

The above remark is relevant to all of the examples
given below.

2. Equidistant spacing.

Besides optimum spacing, there are two other methods of
selecting k statistics in a sample which have been suggested.
One of these, the simplest possible, is to divide the ranks
1, 2, ..., n into equal groups, i.e. by taking

$\lambda_i = \frac{i}{k+1}$ .

Thus, for k = 9, we select the 9 values

$x_{.1n}, x_{.2n}, \dots, x_{.9n}$ ,

the values of which divide the n ranks into 10 equal groups.

The equidistant method has the advantages that no
extensive trial calculations are involved and that it is easier
to remember which values to pick out of a sample. Moreover,
the work of computing the estimates is as simple as finding
quartiles or percentiles.

A disadvantage of the method is that it does not depend
upon the form of the population under observation. Exactly
the same numerical estimates will be obtained for the normal,
Type III, or practically any continuous population. Thus,
conceivably we may lose a warning signal which would
ordinarily be provided by an estimator sensitive to the form of
population, since in that case the estimates would be out of
line with experience more frequently if the observed data did
not actually come from an extreme-value population than if
they did come from it.

Another reason for studying the method of equidistant
spacing is that it is intimately related to certain plotting
procedures advocated by E. J. Gumbel (reference 1, Lecture
2, pp. 20-21), as will be explained below.

The bias and efficiency of the equidistant method can
be obtained by the same procedures as used in optimum
spacing, embodied in equations (11) to (15), with $\lambda_i = i/(k+1)$.
The results for the first 5 values of k are shown in Table I,
columns (7) and (8). The method overestimates the parameter
u by substantial amounts, depending upon the parameter $\beta$
(assumed known), and a negative correction (column (7)) is
necessary in each case, the smallest being $(-.3665\beta)$ for k = 1.
It can be shown that for increasing k the correction tends
toward $(-C\beta)$, where C = 0.577215... is Euler's number.
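The behavior of the correction for equidistant spacing can be tabulated in a few lines. In the sketch below the coefficient is taken as the mean of $-\ln(-\ln\lambda_i)$ over $\lambda_i = i/(k+1)$, the quantity in braces in equation (14); the function name and the value k = 1000 used to suggest the limit are illustrative choices.

```python
import math

EULER_C = 0.5772156649015329

def equidistant_bias(k):
    """Mean of -ln(-ln(i / (k + 1))) for i = 1..k: the bias in units of beta."""
    return sum(-math.log(-math.log(i / (k + 1))) for i in range(1, k + 1)) / k

for k in (1, 2, 3, 4, 5, 1000):
    print(k, round(equidistant_bias(k), 4))
```

The coefficient starts at .3665 for k = 1 and grows slowly toward Euler's number as k increases; the correction to be applied is the negative of this coefficient times $\beta$.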

Efficiency (column (8)) of the equidistant method is
relatively low, starting from just under one-half for one
order statistic (k = 1) and rising to about 63 percent for
estimation by five order statistics. This is not even as much
as that obtainable from one order statistic under optimum
spacing, which is 64.8 percent (column (6)).

The relationship of the equidistant method to Gumbel's
plotting method is as follows. Gumbel uses a special
probability graph paper (Figure 2) and plots horizontally the
values in the sample of extreme observations, $x_1, x_2, \dots, x_N$,
arranged in increasing size. For each $x_i$ he plots the
corresponding ratio $i/(N+1) = \lambda_i$ along a vertical non-uniform
probability scale. Parallel to this probability scale is a
uniform scale of the function

(16)  $y_i = -\ln(-\ln \lambda_i)$ .

The reasoning then proceeds as follows. If the data come
from a true extreme-value distribution, then the points
$(x_i, y_i)$ should be situated closely about the straight line

(17)  $x = u + \frac{y}{\alpha} = u + \beta y$ ,

where u and $\beta$ are the parameters of the underlying
extreme-value distribution.

This follows from the definition of the extreme-value
distribution. The quantity $\lambda$, the cumulative probability
associated with the observed variable x, is by equation (1)
related to x by

$\lambda = e^{-e^{-\alpha(x-u)}}$ ;

i.e.,

$\alpha(x-u) = -\ln(-\ln \lambda)$ .

Hence the corresponding y defined by (16) satisfies

$y = -\ln(-\ln \lambda) = \alpha(x-u)$ ,

i.e.,

$x = u + \frac{y}{\alpha} = u + \beta y$ ,

if x is taken as the variable whose values we wish to predict.
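The transformation behind the probability paper can be sketched as follows. This is an illustration with hypothetical function names; the parameter values u = 12 and $\beta$ = 4.8 in the usage lines are arbitrary and are not estimates from the report.

```python
import math

def plotting_positions(n_values):
    """lambda_i = i / (N + 1), i = 1..N."""
    return [i / (n_values + 1) for i in range(1, n_values + 1)]

def reduced_variate(lam):
    """Equation (16): y = -ln(-ln lambda)."""
    return -math.log(-math.log(lam))

def line_value(y, u, beta):
    """Equation (17): x = u + beta * y."""
    return u + beta * y

def cumulative_probability(x, u, beta):
    """Equation (1) written with alpha = 1 / beta."""
    return math.exp(-math.exp(-(x - u) / beta))

# Points lying exactly on line (17) carry the probabilities they were plotted at.
for lam in plotting_positions(9):
    x = line_value(reduced_variate(lam), 12.0, 4.8)
    print(round(lam, 2), round(x, 4), round(cumulative_probability(x, 12.0, 4.8), 2))
```

The round trip from $\lambda$ to y to x and back to $\lambda$ is exact up to rounding, which is the content of the derivation above.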
The method given for fitting the straight line (17) is
either by eye or by a modification of least squares which
allows the line to pass through the mean of the x's, $\bar{x}_N$, and
the mean of the corresponding y's, $\bar{y}_N$, computed for the
observed sample values. This condition is given by the
relation

(18)  $\hat{u} = \bar{x}_N - \beta \bar{y}_N$

connecting the (estimates of the) parameters u, $\beta$.

Reference to equation (14) shows that this (sample) mean
of the y's, namely,

(19)  $\bar{y}_N = \frac{1}{N} \sum_{i=1}^{N} y_i = \frac{1}{N} \sum_{i=1}^{N} \left[-\ln(-\ln \lambda_i)\right] = b_N$ ,

      $\lambda_i = \frac{i}{N+1}$ ,

is precisely the quantity in curly brackets in (14), if N is
put for k. The values of this quantity have already been
computed in connection with the equidistant method and are
given as the coefficients of $(-\beta)$ in Table I, column (7), and
designated by $b_N$. The quantity $b_N$ approaches Euler's
number, C, as N becomes infinite. The equidistant method is
thus essentially equivalent to that of Gumbel's plotting
positions, with the difference that sample size N for Gumbel
corresponds, not to sample size n for the equidistant method,
but to the number k of order statistics selected out of the n.

These considerations make it possible to find the bias
of Gumbel's method considered as providing an estimator of
the mode u given by equation (18), namely,

$\hat{u} = \bar{x}_N - \beta \bar{y}_N = \bar{x}_N - \beta b_N$ .

The bias is

(20)  $B_N = E(\hat{u}) - u = E(\bar{x}_N) - \beta b_N - u = (u + C\beta) - \beta b_N - u = (C - b_N)\beta$ ,

since the expected value of the sample mean is the same as the
population mean, namely,

(21)  $E(\bar{x}) = u + C\beta$ .
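Equations (18) to (20) translate into a short calculation. The sketch below treats $\beta$ as known, as the bias derivation does; the function names and the idealized data in the usage lines (u = 10, $\beta$ = 3) are hypothetical.

```python
import math

EULER_C = 0.5772156649015329

def mean_reduced_variate(n_values):
    """b_N of equation (19): mean of -ln(-ln(i / (N + 1))), i = 1..N."""
    return sum(-math.log(-math.log(i / (n_values + 1)))
               for i in range(1, n_values + 1)) / n_values

def mode_estimate(xs, beta):
    """Equation (18) with beta treated as known: x_bar - beta * b_N."""
    n_values = len(xs)
    return sum(xs) / n_values - beta * mean_reduced_variate(n_values)

def plotting_bias(n_values, beta):
    """Equation (20): B_N = (C - b_N) * beta."""
    return (EULER_C - mean_reduced_variate(n_values)) * beta

# Idealized points lying exactly on the line x = 10 + 3 y at the plotting positions.
ys = [-math.log(-math.log(i / 6)) for i in range(1, 6)]
xs = [10.0 + 3.0 * y for y in ys]
print(mode_estimate(xs, 3.0), plotting_bias(5, 3.0))
```

For points lying exactly on the line the estimate returns the intercept; the bias in (20) concerns the average behavior over random samples, whose mean has expectation $u + C\beta$.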

Example 2. Using the same data as in Example 1, we
obtain the results derived in Table IV. These values are
compared with other estimates in Table III. It appears that
the estimates given by equal spacing differ from those given
by the sample mean by roughly the same order of magnitude (about
.2 to .6) as do the estimates given by optimum spacing in Example
1. This time, however, the estimates are all greater than that
given by the sample mean, whereas in Example 1 they were all
less. (See also discussion and Remark under Example 1.)

3 . Method  of  expected  values. 

The  last  method  of  spacing  which  will  be  presented  acts 
as  a compromise,  combining  some  of  the  features  of  the  other 
two.  This  method,  based  on  expected  values,  avoids  the  ex- 
tensive computation  of  optimum  spacing  by  making  use  of  a 
special  table  and,  unlike  the  method  of  equal  spacing,  is 
sensitive  to  the  form  of  population  from  which  the  data  are 
assumed  taken. 

The  method  of  expected  values  is  as  follows.  The  variate 
values  uj_  in  (15)  are  chosen  equal  to  the  expected  values  of 
the  order  statistics  of  a sample  of  size  k from  the  two- 
parameter  population  of  largest  values.  The  relation  (15) 
then  determines  the  spacing  Xi» 

If  XjL'jj.  is  the  i^  order  statistic  in  a sample  of  k from 
the  extreme-value  distribution  with  parameters  u,  (3,  then  its 
expected  value  is 

E<xi,k)  = u + E(yi,k)P  , 

where  E (y^  k)  is  the  expected  value  for  the  distribution  whose 
parameters  are  u = 0,  (3  = 1,^0  The  values  of  E(yi'k)  have  been 
computed  by  the  National  Bureau  of  Standards  in  a table  (ref- 
erence 6)  prepared  at  the  suggestion  of  B«  E*  Kimball,  for 

’Taj'This  follows  from  the  fact  that,  by  equation  (1),  if  x 

is  the  variate  in  the  population  F(x)  - exp[  -e“a(x~u)]  9 
and  y the  variate  in  Eq(y)  = exp(-e“-y),  which  is  the 
population  fVwith  parameters  u = 0,  (3  =1,  thenx  and  y 

are  related  byx  = u + I = u+  y(3 , 

a 
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i = 1(1  )n  or  25,  whichever  is  smaller,  and  n = 1(1)10(5)60 
(10 )100s  . From  the  definition  of  the  expected  value  method 


we  have 

i ' 

Ui  - E(xijk)  = u + . 

For use presently we notice that from this relation and equation (12)
we have

$$ E(\hat{u}_k) = \frac{1}{k}\sum_{i=1}^{k} u_i
   = E\Big(\frac{1}{k}\sum_{i=1}^{k} x_{i,k}\Big) = E(\bar{x}_k) , $$

where $\bar{x}_k$ is the ordinary mean of a sample of size $k$. Since the
expected value of the sample mean is always the population mean, we
have by equation (21)

$$ E(\hat{u}_k) = E(x) = u + C\beta . \tag{22} $$

Returning to the determination of $\lambda_i$, we have from equations
(1), (15), and the relation $\alpha\beta = 1$,

$$ \lambda_i = F(u_i) = \exp[-e^{-\alpha(u_i - u)}] = \exp(-e^{-E(y_{i,k})}) . $$

Since the $E(y_{i,k})$ are known, the values $\lambda_i$ can be looked up in
a table of the extreme-value cumulative distribution $\exp(-e^{-y})$
such as Table 1 of reference 4.
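As a rough illustration of the lookup just described, the spacings can also be computed directly from the relation lambda_i = exp(-e^{-E(y_{i,k})}). The following Python sketch is not part of the original report; the E(y_{i,k}) values are copied from Table V (rows k = 1 to 3 only), and the name `spacing_from_expected_value` is hypothetical.

```python
import math

# Expected values E(y_{i,k}) of the reduced order statistics, copied from
# Table V of this report (rows k = 1..3 only, to keep the sketch short).
EXPECTED_REDUCED = {
    1: [0.5772],
    2: [-0.1159, 1.2704],
    3: [-0.4036, 0.4594, 1.6758],
}


def spacing_from_expected_value(e_y: float) -> float:
    """Return lambda = exp(-exp(-E(y))), the reduced extreme-value CDF at E(y)."""
    return math.exp(-math.exp(-e_y))


# Spacings lambda_i for each k, rounded to four places for comparison with Table V.
SPACINGS = {
    k: [round(spacing_from_expected_value(e), 4) for e in row]
    for k, row in EXPECTED_REDUCED.items()
}

if __name__ == "__main__":
    for k, lams in SPACINGS.items():
        print(k, lams)
```

The computed values should agree with the lower half of Table V to within rounding in the fourth decimal place.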

Determination of the $\lambda_i$ for the expected value method is shown
in Table V, and the biases and efficiencies of the method in Table I,
columns (9) and (10). If $\beta$ is known, then the method, for
practical purposes, is unbiased, since the bias, $C\beta$, and the
correction, $-C\beta = -0.5772\beta$, are always the same known values
for any $k$. This follows easily from equation (22), since the bias of
$\hat{u}_k$ in estimating $u$ is, by definition,

$$ E(\hat{u}_k) - u = C\beta . $$

The  efficiencies  of  the  expected  values  method  are  even 
lower  than  for  equidistant  spacing,  starting  from  about  42 
percent  for  estimation  by  a single  order  statistic  and  being 
only  57  percent  for  estimation  by  five  statistics. 
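The efficiency figures can be checked approximately with the usual large-sample covariance of order statistics. The sketch below is an illustration only: it assumes that efficiency is measured against the variance bound beta^2/n for estimating u with beta known, and that the order statistics are equally weighted. The function names are hypothetical. Under these assumptions a single statistic at lambda = .5703 (Table V, k = 1) comes out near 42 percent.

```python
import math


def reduced_density(lam: float) -> float:
    """Density of the reduced extreme-value law at the point whose CDF value is lam."""
    return lam * (-math.log(lam))


def efficiency_of_mean(lams: list[float]) -> float:
    """Asymptotic efficiency of the equally weighted mean of the order
    statistics at cumulative probabilities `lams` (must be ascending),
    used as an estimator of u when beta is known.

    Large-sample covariance: lam_i (1 - lam_j) / (n f_i f_j) for i <= j,
    in units of beta^2.  The assumed bound is beta^2 / n.
    """
    k = len(lams)
    f = [reduced_density(lam) for lam in lams]
    total = 0.0
    for i in range(k):
        for j in range(k):
            lo, hi = min(i, j), max(i, j)
            total += lams[lo] * (1.0 - lams[hi]) / (f[lo] * f[hi])
    scaled_variance = total / k**2  # equals n * Var(estimator) / beta^2
    return 1.0 / scaled_variance


if __name__ == "__main__":
    print(efficiency_of_mean([0.5703]))            # expected values method, k = 1
    print(efficiency_of_mean([0.05, 0.20, 0.45]))  # three-statistic spacing
```

Adding the constant correction term does not change the variance, so the same function applies to the bias-corrected estimators.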

The method of expected values bears the same relation to a graphical
method advocated by Kimball as does the equidistant method to the
plotting procedure of Gumbel. Kimball's method consists in modifying
Gumbel's graphical procedure by simply using for $\lambda_i$ the values
given by the method of expected values instead of the simple ratios
$i/(k + 1)$, and leaving everything else unchanged. The result is to
replace equation (19) by

$$ \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i = \frac{1}{N}\sum_{i=1}^{N} E(y_{i,N}) , $$

which has been shown equal to $C$. Hence the bias in Kimball's method
is

$$ (C - C)\beta = 0 ; $$

i.e., his method has the theoretical merit of giving an unbiased
position for the centroid of the fitted line.
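The statement that the mean of the expected reduced order statistics equals C can be checked numerically against the tabulated values. The short Python sketch below is illustrative only; the numbers are copied from Table V and C is taken as 0.5772, as in the text.

```python
EULER_C = 0.5772  # Euler's constant C, to four places as used in the report

# E(y_{i,k}) copied from Table V of this report.
EXPECTED_REDUCED = {
    1: [0.5772],
    2: [-0.1159, 1.2704],
    3: [-0.4036, 0.4594, 1.6758],
    4: [-0.5735, 0.1061, 0.8128, 1.9635],
    5: [-0.6902, -0.1069, 0.4256, 1.0709, 2.1867],
}

# Mean of the k expected values in each row; each should be close to C.
ROW_MEANS = {k: sum(row) / k for k, row in EXPECTED_REDUCED.items()}

if __name__ == "__main__":
    for k, mean in ROW_MEANS.items():
        print(k, round(mean, 4))
```

Each row mean agrees with C to within the rounding of the table entries.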

Example 3. Again using the data of Example 1, we perform the
calculations for the expected values method as in Table IV, the
results of which are listed in Table III for comparison. It appears
that this method gives estimates about 10 percent closer to the
$\hat{u}$ given by the sample mean than the equidistant method gives.
Again, the values are all higher than for the sample mean. (Compare
Example 1.)

If one of these two systematic methods of spacing order statistics is
used, it appears that at least 3 should be taken, since the last two
examples indicate that $k = 1$ and 2 give relatively widely discrepant
values as compared with larger values of $k$. This precaution, however,
does not seem to be needed with optimum spacing.

VI. ESTIMATION OF DISPERSION PARAMETER $\beta$

1. Use of two sample values.

Mosteller (reference 3) has discussed the use of the quasi-range

$$ \hat{\beta}_2 = (x_{n_2} - x_{n_1})/c $$

to estimate the standard deviation of a normal population from a
sample of $n$. This estimator may also be used in the extreme-value
case for estimating $\beta$, provided $c$ is chosen as

$$ c = u_2 - u_1 , $$

where $u_1$, $u_2$ are defined by

$$ F(u_i) = n_i/n , \qquad i = 1, 2, $$

as in equation (15).

Again using trial methods, we find the optimum spacing to be given by

$$ \lambda_1 = .03 , \qquad \lambda_2 = .85 , $$

with $c = 3.07159$. Hence for two sample values, the optimum estimator
for $\beta$ is

$$ \hat{\beta}_2 = 0.3256\,(x_{.85n} - x_{.03n}) , \tag{23} $$

and its efficiency[b] (as defined earlier) is 59.5 percent.

Two points are worth noting about $\hat{\beta}_2$. First, it is
unbiased, bias being taken care of by the numerical factor. Secondly,
unlike the estimator for $u$, the estimator for $\beta$ does not depend
on a knowledge of the other parameter $u$.
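The numerical factor in (23) follows from the reduced variates at the two chosen spacings. The sketch below recomputes it in Python as an illustration; it assumes that c is evaluated on the reduced distribution exp(-e^{-y}) (u = 0, beta = 1), and the function names are hypothetical.

```python
import math


def reduced_quantile(lam: float) -> float:
    """Reduced variate y with exp(-exp(-y)) = lam, i.e. y = -ln(-ln(lam))."""
    return -math.log(-math.log(lam))


def quasi_range_constant(lam_low: float, lam_high: float) -> float:
    """c = y(lam_high) - y(lam_low); dividing the quasi-range by c removes the bias."""
    return reduced_quantile(lam_high) - reduced_quantile(lam_low)


c = quasi_range_constant(0.03, 0.85)
factor = 1.0 / c  # multiplier applied to x_{.85n} - x_{.03n}

if __name__ == "__main__":
    print(round(c, 5), round(factor, 4))
```

Because the difference of two order statistics does not involve u, the same constant applies whatever the value of u.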
2. Use of four sample values.

The above result for two sample values can be improved by taking two
additional values. While exact determination has not been carried
out, a considerable amount of trial indicates that probably the best
efficiency for four values is reached by using the estimator

$$ \hat{\beta}_4 = 0.2026\,(x_{.85n} + x_{.70n} - x_{.10n} - x_{.03n}) . $$

Its efficiency is 68.9 percent. As for the case of two values, the
numerical factor eliminates the bias, and use of the estimator does
not require a knowledge of the other parameter $u$.
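Both the multiplier and the efficiency of such signed sums of order statistics can be approximated with the large-sample covariance formula. The Python sketch below is an illustration under stated assumptions: the efficiency is taken relative to the variance bound for beta with u known, beta^2 / (n [pi^2/6 + (1 - gamma)^2]), which is not spelled out in this section but reproduces the quoted 59.5 and 68.9 percent. The name `beta_estimator_summary` is hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649


def reduced_quantile(lam: float) -> float:
    """y = -ln(-ln(lam)) for the reduced extreme-value distribution."""
    return -math.log(-math.log(lam))


def reduced_density(lam: float) -> float:
    """Reduced density at the point whose CDF value is lam."""
    return lam * (-math.log(lam))


def beta_estimator_summary(lams, signs):
    """For est. beta = (sum_i signs[i] * x_{lams[i] n}) / c return (1/c, efficiency).

    `lams` must be ascending; `signs` are +1 or -1 and sum to zero so that
    the location u cancels.
    """
    c = sum(s * reduced_quantile(lam) for s, lam in zip(signs, lams))
    f = [reduced_density(lam) for lam in lams]
    quad = 0.0
    for i in range(len(lams)):
        for j in range(len(lams)):
            lo, hi = min(i, j), max(i, j)
            quad += signs[i] * signs[j] * lams[lo] * (1.0 - lams[hi]) / (f[lo] * f[hi])
    scaled_variance = quad / c**2  # n * Var(est. beta) / beta^2
    # Assumed bound for beta with u known, in the same units.
    bound = 1.0 / (math.pi**2 / 6.0 + (1.0 - EULER_GAMMA) ** 2)
    return 1.0 / c, bound / scaled_variance


two = beta_estimator_summary([0.03, 0.85], [-1, 1])
four = beta_estimator_summary([0.03, 0.10, 0.70, 0.85], [-1, -1, 1, 1])
```

Trying other spacings with the same function gives a quick, if informal, way to repeat the kind of trial search mentioned in the text.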

Example 4. If we use the same data as in Example 1, the above two
estimators of $\beta$ give the following results:

2 values: $\hat{\beta}_2 = 0.3256\,(x_{412.25} - x_{14.55})$
                 $= 0.3256\,(21.4583 - 3.9182) = 5.7111$

4 values: $\hat{\beta}_4 = 0.2026\,(x_{412.25} + x_{339.5} - x_{48.5} - x_{14.55})$
                 $= 0.2026\,(21.4583 + 18.4754 - 8.2708 - 3.9182)$
                 $= 5.6211$
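The arithmetic of this example can be retraced in a few lines. The Python sketch below simply takes the order-statistic values quoted in the example as given (it does not recompute them from the grouped data) and applies the two formulas; the variable names are illustrative.

```python
n = 485

# Ranks lam * n of the order statistics used by the two estimators.
ranks = {lam: lam * n for lam in (0.03, 0.10, 0.70, 0.85)}

# Interpolated order statistics as quoted in Example 4 (ft/sec).
x = {0.03: 3.9182, 0.10: 8.2708, 0.70: 18.4754, 0.85: 21.4583}

beta_2 = 0.3256 * (x[0.85] - x[0.03])
beta_4 = 0.2026 * (x[0.85] + x[0.70] - x[0.10] - x[0.03])

if __name__ == "__main__":
    print(ranks)
    print(round(beta_2, 4), round(beta_4, 4))
```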

[b] In the case where both parameters are unknown, the concept of
efficiency defined earlier for the one-parameter case is not strictly
applicable. See section VII below for further comments.

These values do not seem to be in as good agreement with the value
$\beta = 1/\alpha = 4.8263$, found in reference 1 (by means of the
formula $1/\alpha = (\sqrt{6}/\pi)\,s$ given by Gumbel), as is the
agreement between the order-statistics estimates of $u$ and the value
of $u$ found in reference 1 by use of the sample mean. This is not too
surprising in view of the much greater reduction of computing work
through avoiding calculation of the standard deviation as compared
with the reduction by avoiding the mean, since ordinarily it is to be
expected that more information is lost when the reduction in labor is
greater, other things being equal.

For comparison, the efficiency of the Gumbel estimator

$$ \hat{\beta} = \frac{\sqrt{6}}{\pi}\, s , $$

which is (asymptotically) unbiased, is approximately 39 percent. This
value is obtained on the assumption that the sample size $n = 485$ is
large enough so that the first one or two terms in a Taylor series
expansion of $s$ furnish a good approximation. This assumption has been
found to give usable results in similar situations. Determination of
its validity depends upon the exact evaluation of the variance of $s$,
which requires a prohibitive amount of numerical integration by
ordinary methods.[c]

[c] Special sampling procedures in process of development give
promise of shortening the labor to within feasible limits.

VII. ESTIMATION OF MODE WHEN BOTH PARAMETERS ARE UNKNOWN

When both parameters are unknown, one might seek to try a combination
of the above methods. One would first estimate $\beta$ by the
quasi-range with two or four order statistics and then use this value
in conjunction with one of the estimators for $u$ discussed previously.
A complete discussion of this method would, however, involve the
question of joint estimation of two unknown parameters and is outside
the scope of this report. One simple case will nevertheless be touched
on briefly.

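We can try a combination of the most efficient single sample value
with a multiple of the difference of two of the sample values. The
estimator is, from (10) and (23),

$$ u^* = x_{.20n} + .47588\,\hat{\beta}_2
       = x_{.20n} + .15493\,(x_{.85n} - x_{.03n}) . \tag{24} $$

This estimator is unbiased since $\hat{\beta}_2$ is unbiased. Its
variance is, by a modification of equation (13),

$$ V(u^*) = \frac{1}{n}\Big\{ \frac{\lambda_2(1-\lambda_2)}{f_2^2}
   + (.15493)^2\Big[ \frac{\lambda_1(1-\lambda_1)}{f_1^2}
   + \frac{\lambda_3(1-\lambda_3)}{f_3^2}
   - \frac{2\lambda_1(1-\lambda_3)}{f_1 f_3} \Big]
   + 2(.15493)\Big[ \frac{\lambda_2(1-\lambda_3)}{f_2 f_3}
   - \frac{\lambda_1(1-\lambda_2)}{f_1 f_2} \Big] \Big\} , $$

where $\lambda_1 = .03$, $\lambda_2 = .20$, $\lambda_3 = .85$, and the
other quantities are as defined in (13) and (15) above. When evaluated,
this gives

$$ V(u^*) = 1.7423\,\beta^2/n . $$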
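The coefficient 1.7423 can be reproduced from the large-sample variances and covariances of the three order statistics involved. The Python sketch below is an illustration, assuming the usual asymptotic covariance lambda_i (1 - lambda_j) / (n f_i f_j) for i <= j, with the densities taken from the reduced distribution so that the result is in units of beta^2/n. The helper name `cov_term` is hypothetical.

```python
import math


def reduced_density(lam: float) -> float:
    """Reduced extreme-value density at the point whose CDF value is lam."""
    return lam * (-math.log(lam))


def cov_term(lam_a: float, lam_b: float) -> float:
    """n * Cov / beta^2 for order statistics at lam_a <= lam_b."""
    return lam_a * (1.0 - lam_b) / (reduced_density(lam_a) * reduced_density(lam_b))


lam1, lam2, lam3 = 0.03, 0.20, 0.85
w = 0.15493  # multiplier of (x_{.85n} - x_{.03n}) in the estimator (24)

scaled_variance = (
    cov_term(lam2, lam2)
    + w**2 * (cov_term(lam1, lam1) + cov_term(lam3, lam3) - 2.0 * cov_term(lam1, lam3))
    + 2.0 * w * (cov_term(lam2, lam3) - cov_term(lam1, lam2))
)  # equals n * V(u*) / beta^2

if __name__ == "__main__":
    print(round(scaled_variance, 4))
```

The cross term is small and negative here, so most of the variance comes from the single order statistic at the .20 point.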

The corresponding Gumbel estimator of $u$, when $\beta$ is unknown, is

$$ \hat{u}_G = \bar{x} - \frac{\sqrt{6}}{\pi}\, C s , $$

where $s$ is the sample standard deviation. As in the case of
$\hat{\beta}$, the Gumbel estimator of $\beta$, the exact calculation of
the variance of $\hat{u}_G$ may involve prohibitive labor. The
approximation method used in connection with $\hat{\beta}$ gives

$$ V(\hat{u}_G) = 1.4042\,\beta^2/n . $$

The  concept  of  efficiency  previously  used,  where  only 
one  parameter  was  unknown,  is  not  directly  applicable  to  the 
case  of  more  than  one  unknown  parameter,  but  involves  the 
concept  of  joint  estimation  mentioned  above.  Instead,  we 
may  proceed  as  follows. 

In the one-parameter case, two different unbiased estimators
$\hat{u}_1$, $\hat{u}_2$ of one of the parameters, say $u$, were compared
by dividing the variance of each into a common value, namely, the
theoretically smallest value (under certain conditions) which the
variance of any (unbiased) estimator of $u$ could have. This furnished
for each estimator an index, called efficiency, of the form

$$ \mathrm{eff}(\hat{u}_i) = \frac{\sigma^2(u)_{\min}}{\sigma^2(\hat{u}_i)} , \qquad i = 1, 2. $$

Instead of evaluating each fraction separately, we could compare the
efficiencies of $\hat{u}_1$ and $\hat{u}_2$ by simply taking their ratio,

$$ \frac{\mathrm{eff}(\hat{u}_1)}{\mathrm{eff}(\hat{u}_2)}
   = \frac{\sigma^2(\hat{u}_2)}{\sigma^2(\hat{u}_1)} = k , \text{ say,} \qquad 0 < k < 1 , $$

and say that the estimator $\hat{u}_1$ is $k$ times as efficient as the
estimator $\hat{u}_2$.

In the two-parameter case we can also compare estimators, in similar
fashion, by taking the ratio of their variances, as $k$ above, and
designating this ratio the "relative efficiency" of $\hat{u}_1$ relative
to $\hat{u}_2$, with the understanding that the word "efficiency" will
not necessarily have the same meaning as in the one-parameter case.

We then have, subject to the approximations described above in
connection with the estimator $\hat{\beta}$,

$$ k = \frac{V(\hat{u}_G)}{V(u^*)} = 0.8060 . $$

That is, the estimator of $u$ constructed from order statistics is
about 81 percent as "efficient" as the estimator given by Gumbel, in
the case where both parameters are unknown. Thus the order statistic
estimator $u^*$ can be used for rapid calculation without too great a
loss of accuracy over that of the estimator of Gumbel, which requires
computation of sums of squares.

This discussion is for $\beta$ unknown. If $\beta$ is known, then $u^*$
should be replaced by one of the more efficient estimators listed in
Table I.

Example 5. Applying the estimator in (24) to the same data as used
before with $\beta$ now unknown gives, from the results of Examples 1
and 4 above,

$$ u^* = 10.2258 + .47588\,(5.7111) = 12.9436 , \qquad \hat{\beta}_2 = 5.7111 . $$

The estimate of $u$ is remarkably close to that obtained by Gumbel's
formula, 12.8370, but does not require the labor of squaring, adding,
and taking a square root. The estimate of $\beta$ is of course the same
as in Example 4.
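The figures in this example, together with the relative efficiency quoted in this section, can be retraced as follows. The Python sketch is illustrative only; the order-statistic values are those quoted in Examples 1 and 4, and the two variance coefficients are the ones stated in the text.

```python
# Order statistics quoted in Examples 1 and 4 (ft/sec).
x_20, x_85, x_03 = 10.2258, 21.4583, 3.9182

beta_2 = 0.3256 * (x_85 - x_03)            # estimator (23)

u_star_a = x_20 + 0.47588 * beta_2         # first form of estimator (24)
u_star_b = x_20 + 0.15493 * (x_85 - x_03)  # second form of estimator (24)

# Ratio of the variance coefficients (units of beta^2 / n) stated in the text:
# Gumbel estimator over order-statistic estimator.
relative_efficiency = 1.4042 / 1.7423

if __name__ == "__main__":
    print(round(u_star_a, 4), round(u_star_b, 4), round(relative_efficiency, 4))
```

The two forms of (24) differ only in the rounding of the constants, so they agree to about three decimal places.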

VIII.  CONCLUSIONS 

Several large-sample unbiased estimators of the parameters of an
extreme-value distribution have been developed which, with one
exception, appear to have greater efficiency than those derived from
methods of E. J. Gumbel and B. F. Kimball and yet require much less
effort in computation, involving essentially a mechanical sorting to
find pre-designated ranked values in a sample of size $n$.

The unbiased estimators which can be recommended are the following
linear functions of order statistics, where, for example, $x_{.20n}$
means the $(.20n)$th observation in the sample when all are arranged
in ascending order, and "est. u" means "estimator of u," the
subscripts being omitted for simplicity. Application to a sample of
$n = 485$ maximum gust-velocity observations analyzed in NACA TN 1926
(reference 1) has been found to give satisfactory results.

1. For estimating the mode $u$ when $\beta = 1/\alpha$ is known. Use:

(a) est. u = $x_{.20n} + .4759\,\beta$. Efficiency = 65%

(b) est. u = $\tfrac{1}{2}(x_{.08n} + x_{.40n}) + .4074\,\beta$. Efficiency = 82%

(c) est. u = $\tfrac{1}{3}(x_{.05n} + x_{.20n} + x_{.45n}) + .4494\,\beta$. Efficiency = 88½%

By comparison, estimation by using the mean of all $n$ values, as in
the Gumbel method, is about 61 percent efficient.

2. For estimating $u$ when $\beta$ is unknown. Use:

est. u = $x_{.20n} + .1549\,(x_{.85n} - x_{.03n})$.

This estimator is 81 percent as "efficient" as the estimator given by
Gumbel which involves the sample mean and standard deviation, but
avoids the work of squaring and summing.

3. For estimating $\beta$ whether or not $u$ is known. Use:

(a) est. $\beta$ = $.3256\,(x_{.85n} - x_{.03n})$. Efficiency = 59½%

(b) est. $\beta$ = $.2026\,(x_{.85n} + x_{.70n} - x_{.10n} - x_{.03n})$. Efficiency = 69%

These agree less well with the estimator requiring the standard
deviation, but the efficiency of the latter is indicated to be about
39 percent.
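For ungrouped data the recommended estimators reduce to a sort and a few lookups. The Python sketch below is one possible rendering, not the report's own procedure: the report does not say how a fractional rank such as .20n is to be resolved, so linear interpolation between neighbouring ranks is assumed here, and all function names are hypothetical. A simulated sample (inverse-CDF sampling, also an addition) is used only to exercise the code.

```python
import math
import random


def order_statistic(sorted_x, lam):
    """x_{lam n}: value at 1-based rank lam * n, interpolating linearly
    between neighbouring ranks (an assumed convention for fractional ranks)."""
    n = len(sorted_x)
    rank = min(max(lam * n, 1.0), float(n))
    lower = int(math.floor(rank))
    if lower >= n:
        return sorted_x[n - 1]
    frac = rank - lower
    return sorted_x[lower - 1] + frac * (sorted_x[lower] - sorted_x[lower - 1])


def estimate_beta(sample):
    """Recommendation 3(b): four-value estimator of beta."""
    xs = sorted(sample)
    q = lambda lam: order_statistic(xs, lam)
    return 0.2026 * (q(0.85) + q(0.70) - q(0.10) - q(0.03))


def estimate_u(sample, beta=None):
    """Recommendation 1(c) when beta is supplied, otherwise recommendation 2."""
    xs = sorted(sample)
    q = lambda lam: order_statistic(xs, lam)
    if beta is None:
        return q(0.20) + 0.1549 * (q(0.85) - q(0.03))
    return (q(0.05) + q(0.20) + q(0.45)) / 3.0 + 0.4494 * beta


def simulate_extremes(n, u, beta, seed=0):
    """Draw n values from F(x) = exp(-exp(-(x - u) / beta)) by inverting the CDF."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        p = rng.random()
        if 0.0 < p < 1.0:  # guard against log(0)
            out.append(u - beta * math.log(-math.log(p)))
    return out


if __name__ == "__main__":
    data = simulate_extremes(20000, u=12.0, beta=5.0)
    print(estimate_u(data), estimate_u(data, beta=5.0), estimate_beta(data))
```

With a sample of this size the three estimates should fall close to the values used in the simulation; for small samples the large-sample constants need not be appropriate.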

Other methods of choosing the order statistics which were examined
may have some theoretical advantage and perhaps be somewhat simpler,
but they have substantially lower efficiency than those recommended.

Finally, it should be urged that when data are presented for
analysis, they should be given in their original detailed form,
before grouping or other processing. This would obviate the need for
interpolation or other devices which may tend to vitiate the accuracy
of the analysis.


Statistical  Engineering  Laboratory 
National  Bureau  of  Standards 
Washington, D. C.


15 August 1951

TABLE I

Comparison  of  three  types  of  methods  of  estimating  the  mode  u of  the  distribution  of 
largest  values:  (i)  use  of  k order  statistics  spaced  in  three  different  ways,  p (=  A 
TABLE II

Distribution of n = 485 maximum gust velocities

Class interval                  Cumulative frequency
   (ft/sec)       Frequency        from smallest

   2 to  4            4                   4
   4 to  6           11                  15
   6 to  8           27                  42
   8 to 10           48                  90
  10 to 12           62                 152
  12 to 14           58                 210
  14 to 16           55                 265
  16 to 18           60                 325
  18 to 20           61                 386
  20 to 22           36                 422
  22 to 24           17                 439
  24 to 26           18                 457
  26 to 28            8                 465
  28 to 30            7                 472
  30 to 32            6                 478
  32 to 34            3                 481
  34 to 36            1                 482
  36 to 38            2                 484
  38 to 40            1                 485

  Total             485

Source: Table 2 of reference 1
(1946 Thunderstorm Project data)
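Order statistics at fractional ranks, such as $x_{.20n} = x_{97}$, have to be interpolated when only grouped data of this kind are available. The Python sketch below shows one way to do so, assuming observations are spread uniformly within each class interval and that cumulative frequencies refer to the upper class limit. That the report used exactly this rule is an assumption, although it reproduces, for instance, the values 10.2258 and 21.4583 quoted in the examples. The function name is hypothetical.

```python
# Upper class limits (ft/sec) and cumulative frequencies from Table II.
UPPER_LIMITS = [4, 6, 8, 10, 12, 14, 16, 18, 20, 22,
                24, 26, 28, 30, 32, 34, 36, 38, 40]
CUMULATIVE = [4, 15, 42, 90, 152, 210, 265, 325, 386, 422,
              439, 457, 465, 472, 478, 481, 482, 484, 485]
CLASS_WIDTH = 2.0


def grouped_order_statistic(rank: float) -> float:
    """Value at a (possibly fractional) rank, assuming observations are
    spread uniformly within each class interval."""
    previous_cum = 0
    for upper, cum in zip(UPPER_LIMITS, CUMULATIVE):
        if rank <= cum:
            lower = upper - CLASS_WIDTH
            return lower + CLASS_WIDTH * (rank - previous_cum) / (cum - previous_cum)
        previous_cum = cum
    raise ValueError("rank exceeds the sample size")


if __name__ == "__main__":
    n = 485
    for lam in (0.20, 0.70, 0.85):
        print(lam, round(grouped_order_statistic(lam * n), 4))
```

Interpolation of this kind is exactly the device that becomes unnecessary when the original ungrouped observations are available.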
TABLE III

Estimates of mode u obtained by various methods
from a sample of n = 485 extremes

$\beta$ = 4.8263

            Estimates of u for three methods of          Estimation by
            spacing k order statistics                   sample mean
                                                         $\hat{u} = \bar{x} - C\beta$
    k       "Optimum"   Equidistant   Expected values
   (1)         (2)          (3)             (4)               (5)

    1       12.5226      13.4130         13.6008
    2       12.5719      13.1876         12.9896
    3       12.4910      13.0767         12.9594
    4                    13.0114         12.9008
    5                    13.0031         12.9794
 n = 485                                                    12.8370

Source of basic data: NACA TN 1926, Table II


Calculations for estimating mode u from sample of n = 485 extremes

NOTE: For comparison, the estimate of u given by the sample mean is $\hat{u} = \bar{x} - C\beta = 12.8370$

TABLE V

Calculation of spacing constants
for expected values method

                         $E(y_{i,k})$
   k      i = 1      i = 2      i = 3      i = 4      i = 5

   1      0.5772
   2     -0.1159     1.2704
   3     -0.4036     0.4594     1.6758
   4     -0.5735     0.1061     0.8128     1.9635
   5     -0.6902    -0.1069     0.4256     1.0709     2.1867

               $\lambda_i = \exp[-e^{-E(y_{i,k})}]$
   k      i = 1      i = 2      i = 3      i = 4      i = 5

   1      0.5703
   2      0.3253     0.7551
   3      0.2238     0.5316     0.8292
   4      0.1696     0.4068     0.6416     0.8689
   5      0.1389     0.3286     0.5202     0.7097     0.8937

Source: See text, Section V, 3

Figure 2

Specimen of extreme-value probability graph paper
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